## 'data.frame': 1599 obs. of 13 variables:
## $ X : int 1 2 3 4 5 6 7 8 9 10 ...
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : int 5 5 5 6 5 5 5 7 7 5 ...
Structure of the loaded red wine quality dataset
## 'data.frame': 1599 obs. of 13 variables:
## $ fixed.acidity : num 7.4 7.8 7.8 11.2 7.4 7.4 7.9 7.3 7.8 7.5 ...
## $ volatile.acidity : num 0.7 0.88 0.76 0.28 0.7 0.66 0.6 0.65 0.58 0.5 ...
## $ citric.acid : num 0 0 0.04 0.56 0 0 0.06 0 0.02 0.36 ...
## $ residual.sugar : num 1.9 2.6 2.3 1.9 1.9 1.8 1.6 1.2 2 6.1 ...
## $ chlorides : num 0.076 0.098 0.092 0.075 0.076 0.075 0.069 0.065 0.073 0.071 ...
## $ free.sulfur.dioxide : num 11 25 15 17 11 13 15 15 9 17 ...
## $ total.sulfur.dioxide: num 34 67 54 60 34 40 59 21 18 102 ...
## $ density : num 0.998 0.997 0.997 0.998 0.998 ...
## $ pH : num 3.51 3.2 3.26 3.16 3.51 3.51 3.3 3.39 3.36 3.35 ...
## $ sulphates : num 0.56 0.68 0.65 0.58 0.56 0.56 0.46 0.47 0.57 0.8 ...
## $ alcohol : num 9.4 9.8 9.8 9.8 9.4 9.4 9.4 10 9.5 10.5 ...
## $ quality : Ord.factor w/ 6 levels "3"<"4"<"5"<"6"<..: 3 3 3 4 3 3 3 5 5 3 ...
## $ ratings : Factor w/ 3 levels "low","average",..: 2 2 2 2 2 2 2 3 3 2 ...
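Structure of the red wine quality dataset after removing column X, an index column that is not needed for the analysis
A new variable, ratings, is added to the dataset, and quality is stored as an ordered factor.
The dataset has 1599 observations and 12 original features (11 chemical properties plus quality); ratings is derived from quality.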
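The two preparation steps described above (dropping X and storing quality as an ordered factor) can be sketched in R as follows. This is a minimal sketch on a hypothetical four-row data frame `wine` built from the first rows of the `str()` output; the report does not show its loading code, so the object name is an assumption.

```r
# Hypothetical stand-in for the loaded CSV (first four rows of the str()
# output); the real file name and path are not given in this report.
wine <- data.frame(
  X = 1:4,
  alcohol = c(9.4, 9.8, 9.8, 9.8),
  quality = c(5L, 5L, 5L, 6L)
)

# Drop the row-index column X, which carries no information about the wine.
wine$X <- NULL

# Store quality as an ordered factor so that levels sort as "3" < "4" < ...
wine$quality <- factor(wine$quality, ordered = TRUE)

str(wine)
```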
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 4.60 Min. :0.1200 Min. :0.000 Min. : 0.900
## 1st Qu.: 7.10 1st Qu.:0.3900 1st Qu.:0.090 1st Qu.: 1.900
## Median : 7.90 Median :0.5200 Median :0.260 Median : 2.200
## Mean : 8.32 Mean :0.5278 Mean :0.271 Mean : 2.539
## 3rd Qu.: 9.20 3rd Qu.:0.6400 3rd Qu.:0.420 3rd Qu.: 2.600
## Max. :15.90 Max. :1.5800 Max. :1.000 Max. :15.500
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.01200 Min. : 1.00 Min. : 6.00
## 1st Qu.:0.07000 1st Qu.: 7.00 1st Qu.: 22.00
## Median :0.07900 Median :14.00 Median : 38.00
## Mean :0.08747 Mean :15.87 Mean : 46.47
## 3rd Qu.:0.09000 3rd Qu.:21.00 3rd Qu.: 62.00
## Max. :0.61100 Max. :72.00 Max. :289.00
## density pH sulphates alcohol quality
## Min. :0.9901 Min. :2.740 Min. :0.3300 Min. : 8.40 3: 10
## 1st Qu.:0.9956 1st Qu.:3.210 1st Qu.:0.5500 1st Qu.: 9.50 4: 53
## Median :0.9968 Median :3.310 Median :0.6200 Median :10.20 5:681
## Mean :0.9967 Mean :3.311 Mean :0.6581 Mean :10.42 6:638
## 3rd Qu.:0.9978 3rd Qu.:3.400 3rd Qu.:0.7300 3rd Qu.:11.10 7:199
## Max. :1.0037 Max. :4.010 Max. :2.0000 Max. :14.90 8: 18
## ratings
## low : 63
## average:1319
## high : 217
The output above shows the summary statistics of the dataset.
## 3 4 5 6 7 8
## 10 53 681 638 199 18
Most wines have quality 5 or 6. Very few have low quality (3 or 4), and some have high quality (7 or 8).
In terms of ratings, most wines are average and very few are low. There are more high-rated wines than low-rated ones, but far fewer than average-rated ones.
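The imbalance can be quantified by collapsing the quality counts from the table above into the three rating groups used in this report (3-4 low, 5-6 average, 7-8 high). A minimal R sketch; the object names are hypothetical:

```r
# Quality counts taken from the table above.
quality_counts <- c(`3` = 10, `4` = 53, `5` = 681, `6` = 638, `7` = 199, `8` = 18)

# Collapse into the three rating groups: 3-4 low, 5-6 average, 7-8 high.
rating_counts <- c(
  low     = sum(quality_counts[c("3", "4")]),
  average = sum(quality_counts[c("5", "6")]),
  high    = sum(quality_counts[c("7", "8")])
)

rating_counts                                        # low 63, average 1319, high 217
round(100 * rating_counts / sum(rating_counts), 1)   # low 3.9, average 82.5, high 13.6
```

The group counts match the ratings summary above, and roughly 82.5% of the wines fall into the average group.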
The distributions of the red wines' chemical property values:
Normal: density, fixed.acidity, pH, volatile.acidity, although all of them are slightly right-skewed.
Right-skewed: alcohol, citric.acid, free.sulfur.dioxide, sulphates, total.sulfur.dioxide
Highly right-skewed: chlorides, residual.sugar
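The classification above was made from the histograms. A numeric check can back it up; the sketch below computes moment-based sample skewness. The helper names `skewness` and `skew_label`, and the 0.5 / 1 cut-offs, are assumptions (common rules of thumb), not part of this report.

```r
# Moment-based sample skewness (hypothetical helper; base R has no built-in).
# Positive values indicate a longer right tail.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Assumed cut-offs: |g| < 0.5 roughly symmetric, 0.5-1 moderately skewed,
# above 1 highly skewed.
skew_label <- function(g) {
  ifelse(abs(g) < 0.5, "roughly symmetric",
         ifelse(abs(g) <= 1, "moderately skewed", "highly skewed"))
}

# Usage with the loaded data frame (assumed to be named `wine`):
# g <- sapply(wine[sapply(wine, is.numeric)], skewness)
# data.frame(skewness = round(g, 2), shape = skew_label(g))
```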
The boxplots of the red wines' chemical property values show that residual.sugar, chlorides, sulphates and total.sulfur.dioxide have many outliers, while fixed.acidity, volatile.acidity, citric.acid, free.sulfur.dioxide and pH have few.
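The outlier counts behind this comparison can be obtained without plotting: `boxplot.stats()` (base R, grDevices package) returns the points that `boxplot()` draws beyond the whiskers, i.e. beyond 1.5 times the hinge spread from the box by default. The helper `count_outliers` is hypothetical, and the toy data frame only illustrates the call.

```r
# Count, for every numeric column, the points a boxplot would show as outliers.
count_outliers <- function(df, coef = 1.5) {
  num <- df[sapply(df, is.numeric)]
  sapply(num, function(x) length(boxplot.stats(x, coef = coef)$out))
}

# Toy example; with the real data the call would be count_outliers(wine).
toy <- data.frame(a = c(1:10, 100), b = 1:11)
count_outliers(toy)   # a 1, b 0
```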
This tidy data set contains 1,599 red wines with 11 variables on the chemical properties of the wine.
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops; it’s rare to find wines with less than 1 gram/liter, and wines with greater than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
Output variable (based on sensory data):
12 - quality (score between 0 and 10)
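Several descriptions above quote concrete thresholds. A sketch that flags wines crossing them, assuming the units of the UCI source data (residual.sugar in g/dm^3, free.sulfur.dioxide in mg/dm^3, treated here as approximately ppm); the helper `flag_thresholds` and the toy values are hypothetical:

```r
# Hypothetical helper: one logical column per threshold from the descriptions.
flag_thresholds <- function(df) {
  data.frame(
    low_sugar      = df$residual.sugar < 1,        # rare: below 1 g/L
    sweet          = df$residual.sugar > 45,       # considered sweet
    so2_noticeable = df$free.sulfur.dioxide > 50   # evident in nose and taste
  )
}

# Toy rows for illustration only.
toy <- data.frame(residual.sugar = c(0.9, 2.2, 15.5),
                  free.sulfur.dioxide = c(11, 72, 14))
colSums(flag_thresholds(toy))   # low_sugar 1, sweet 0, so2_noticeable 1
```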
Observations: most wines have medium quality (quality 5 and 6).
Quality is the main feature. We want to find out which factors determine quality. The dataset documentation explains how some of the chemical properties affect quality; for example, its authors note that high volatile acidity can lead to an unpleasant, vinegar-like taste. However, since we have done only univariate analysis so far, we don’t know exactly how these chemical properties are related to the quality of wine.
A ratings variable was created from quality: wines with quality below 5 are rated low, wines with quality 5 or 6 are rated average, and wines with quality above 6 are rated high.
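The rule above maps directly onto `cut()` with right-closed intervals. A minimal sketch; `make_ratings` is a hypothetical helper name, and the quality vector is rebuilt from the counts in the quality table purely to check the result against the ratings summary:

```r
# Below 5 -> low, 5-6 -> average, above 6 -> high. as.character() comes first
# because quality is an ordered factor, and as.numeric() on a factor would
# return the level codes rather than the quality scores.
make_ratings <- function(quality) {
  q <- as.numeric(as.character(quality))
  cut(q, breaks = c(-Inf, 4, 6, Inf), labels = c("low", "average", "high"))
}

# A vector with the same distribution as the quality table.
quality <- factor(rep(3:8, times = c(10, 53, 681, 638, 199, 18)), ordered = TRUE)
table(make_ratings(quality))   # low 63, average 1319, high 217
```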
Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
Some distributions are unusual: chlorides and residual.sugar are highly right-skewed.
Apart from dropping the index column X, storing quality as an ordered factor and adding the derived ratings variable, I haven’t done any operations on the data.
Alcohol, sulphates, citric.acid, fixed.acidity and volatile.acidity are strongly correlated with wine quality ratings. pH, density and chlorides also show some correlation with the ratings.
Residual.sugar, chlorides and free.sulfur.dioxide are not significantly important in distinguishing low-quality from high-quality wines.
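The correlations with quality can be computed and ranked directly. A sketch with a hypothetical helper `quality_correlations`; because quality is an ordinal score, passing `method = "spearman"` to `cor()` may be a reasonable alternative to the default Pearson coefficient. The toy data frame is illustrative only.

```r
# Correlation of every numeric feature with quality, sorted by magnitude.
# quality may be an ordered factor, so it is converted back to its score first.
quality_correlations <- function(df, method = "pearson") {
  q <- as.numeric(as.character(df$quality))
  features <- setdiff(names(df)[sapply(df, is.numeric)], "quality")
  r <- sapply(features, function(f) cor(df[[f]], q, method = method))
  r[order(abs(r), decreasing = TRUE)]
}

toy <- data.frame(a = c(1, 2, 3, 4), b = c(4, 3, 2, 1), c = c(1, 3, 2, 4),
                  quality = factor(c(3, 4, 5, 6), ordered = TRUE))
quality_correlations(toy)
```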
Alcohol is positively correlated with pH and negatively correlated with density. Wines with high volatile.acidity tend to be low quality. Density is positively correlated with fixed.acidity. Fixed.acidity is positively correlated with citric.acid and negatively correlated with pH.
Alcohol, pH, sulphates, density, fixed.acidity, volatile.acidity, citric.acid are the main factors which determine low or high quality wines.
Alcohol is positively correlated with pH and negatively correlated with density and total.sulfur.dioxide
pH is negatively correlated with volatile.acidity
Density is positively correlated with fixed.acidity
Fixed.acidity is positively correlated with citric.acid
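Pairwise relationships like these can be listed from the correlation matrix. A sketch with a hypothetical helper `strong_pairs` and an arbitrary 0.5 threshold (an assumption, not a value used in this report):

```r
# List each pair of numeric variables whose |r| meets the threshold.
# upper.tri() keeps every pair once and skips the diagonal.
strong_pairs <- function(df, threshold = 0.5) {
  num <- df[sapply(df, is.numeric)]
  m <- cor(num)
  idx <- which(abs(m) >= threshold & upper.tri(m), arr.ind = TRUE)
  data.frame(var1 = rownames(m)[idx[, 1]],
             var2 = colnames(m)[idx[, 2]],
             r = m[idx],
             stringsAsFactors = FALSE)
}

# Toy data for illustration; with the real data: strong_pairs(wine).
toy <- data.frame(x = 1:5, y = c(2, 4, 6, 8, 10), z = c(5, 3, 4, 1, 2))
strong_pairs(toy)
```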
There are strong relationships between wine quality and both alcohol and the acidity of the wine: fixed.acidity, volatile.acidity and citric.acid.
Alcohol and density are negatively correlated. Wine quality increases with alcohol, and low-density wines tend to be high quality. High volatile acidity helps identify low-quality wines.
High-quality wines have higher sulphates. Here again, high volatile acidity is associated with low-quality wines.
Density and fixed.acidity are positively correlated. Here again we see the effect of high volatile acidity on low-quality wines.
The multivariate analysis confirms that alcohol is an important chemical property for the quality of red wine. The dataset description states that high volatile.acidity is associated with low-quality wines, and the graphs in this section show that quality decreases as volatile.acidity increases. High-quality wines also tend to have higher sulphates and lower density.
Acidity is important in wine quality. High citric.acid and fixed.acidity tend to produce better quality wines as long as volatile.acidity is not high.
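The multivariate comparisons above can be summarised numerically by computing per-rating medians of the key properties. A sketch with a hypothetical helper `median_by_rating`, assuming a data frame with the `ratings` factor described earlier; the toy values are illustrative only:

```r
# Median of the selected properties within each rating group.
median_by_rating <- function(df, vars) {
  aggregate(df[vars], by = list(ratings = df$ratings), FUN = median)
}

# Toy data; with the real data frame, a call such as
# median_by_rating(wine, c("alcohol", "volatile.acidity", "sulphates", "density"))
toy <- data.frame(
  alcohol = c(9, 10, 11, 12, 13, 14),
  volatile.acidity = c(0.9, 0.8, 0.5, 0.5, 0.3, 0.2),
  ratings = factor(c("low", "low", "average", "average", "high", "high"),
                   levels = c("low", "average", "high"))
)
median_by_rating(toy, c("alcohol", "volatile.acidity"))
```

Medians are used rather than means because several of these properties are skewed and have outliers.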
Most of the red wines in the dataset have average quality. There are very few low-quality wines but quite a few high-quality ones.
Alcohol, pH, sulphates, density, fixed.acidity, volatile.acidity, citric.acid are the main factors which determine low or high quality wines.
Alcohol and volatile.acidity are important chemical properties defining the quality of red wine. Quality increases as alcohol increases and, conversely, decreases as volatile.acidity increases. Low density is another factor associated with higher quality.
The red wine dataset contains 1599 observations with 11 features on the chemical properties. The main feature is the wine quality. We are interested in the chemical property features which determine wine quality. Below are our findings.
1 - Fixed acidity is positively correlated with wine quality, unlike volatile acidity.
2 - Volatile acidity is important in determining wine quality and is negatively correlated with it. In our analysis, low-quality wines have high volatile acidity.
3 - Citric acid is positively correlated with wine quality, unlike volatile acidity. Our analysis shows that wine quality increases as citric acid increases.
4 - Residual sugar is not effective in determining the wine quality.
5 - Chlorides is not effective in determining the wine quality.
6 - Free sulfur dioxide doesn’t have significant effect on wine quality.
7 - Total sulfur dioxide doesn’t have significant effect on wine quality.
8 - Density determines the wine quality. The data suggest that good quality wines have low density.
9 - PH determines the wine quality. The data suggest that good quality wines have low pH.
10 - Sulphates are effective in determining wine quality. Wines with higher sulphates have higher quality.
11 - Alcohol is the most important factor determining the wine quality. The data strongly suggest that the higher the alcohol content, the more likely the better wine quality.
The red wine quality dataset is highly imbalanced. Most of the wines have average quality and there are very few low-quality wines. More data on low- and high-quality wines could improve the analysis, and some chemical properties that this analysis found to have no effect on wine quality might then give different results.